```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Chunkr - Intelligent Document Processing Tool</title>
    <link href="https://cdn.staticfile.org/font-awesome/6.4.0/css/all.min.css" rel="stylesheet">
    <link href="https://cdn.staticfile.org/tailwindcss/2.2.19/tailwind.min.css" rel="stylesheet">
    <link href="https://fonts.googleapis.com/css2?family=Noto+Serif+SC:wght@400;500;600;700&family=Noto+Sans+SC:wght@300;400;500;700&display=swap" rel="stylesheet">
    <style>
        body {
            font-family: 'Noto Sans SC', Tahoma, Arial, Roboto, "Droid Sans", "Helvetica Neue", "Droid Sans Fallback", "Heiti SC", "Hiragino Sans GB", Simsun, sans-serif;
            color: #1a1a1a;
            background-color: #f8f9fa;
        }
        .hero-gradient {
            background: linear-gradient(135deg, #6e8efb 0%, #4a6cf7 100%);
        }
        .card-hover {
            transition: all 0.3s ease;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
        }
        .card-hover:hover {
            transform: translateY(-5px);
            box-shadow: 0 10px 20px rgba(0, 0, 0, 0.15);
        }
        .feature-icon {
            background: linear-gradient(135deg, #6e8efb 0%, #4a6cf7 100%);
            -webkit-background-clip: text;
            background-clip: text;
            -webkit-text-fill-color: transparent;
            color: transparent;
        }
        .code-block {
            background-color: #2d3748;
            border-radius: 0.5rem;
        }
        .drop-cap:first-letter {
            float: left;
            font-size: 3.5rem;
            line-height: 2.5rem;
            padding-top: 0.5rem;
            padding-right: 0.5rem;
            font-weight: bold;
            color: #4a6cf7;
        }
    </style>
</head>
<body>
    <!-- Hero Section -->
    <section class="hero-gradient text-white py-20 px-4 md:px-0">
        <div class="container mx-auto max-w-6xl flex flex-col md:flex-row items-center">
            <div class="md:w-1/2 mb-10 md:mb-0">
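                <h1 class="text-4xl md:text-5xl font-bold mb-6 leading-tight">Redefining Document Processing</h1>
                <p class="text-xl md:text-2xl mb-8 opacity-90 leading-relaxed">Chunkr is an open-source intelligent document processing tool built for AI developers and data scientists. It turns complex documents into structured data ready for RAG pipelines and LLMs.</p>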
                <div class="flex flex-col sm:flex-row gap-4">
                    <a href="https://github.com/lumina-ai-inc/chunkr" class="bg-white text-blue-600 hover:bg-gray-100 font-semibold py-3 px-6 rounded-lg transition duration-300 flex items-center justify-center">
                        <i class="fab fa-github mr-2"></i> GitHub Repository
                    </a>
                    <a href="#quick-start" class="bg-transparent border-2 border-white hover:bg-white hover:text-blue-600 font-semibold py-3 px-6 rounded-lg transition duration-300 flex items-center justify-center">
                        <i class="fas fa-rocket mr-2"></i> Quick Start
                    </a>
                </div>
            </div>
            <div class="md:w-1/2 flex justify-center">
                <img src="https://cdn.nlark.com/yuque/0/2025/png/21449790/1754718630915-cedaca82-8fb3-4c74-9170-91c26883cb6a.png" alt="Chunkr interface preview" class="rounded-lg shadow-2xl max-w-full h-auto border-4 border-white">
            </div>
        </div>
    </section>

    <!-- Problem Statement -->
    <section class="py-16 px-4 bg-white">
        <div class="container mx-auto max-w-5xl">
            <div class="text-center mb-16">
                <h2 class="text-3xl md:text-4xl font-bold mb-4 text-gray-800">What Problems Does It Solve?</h2>
                <div class="w-20 h-1 bg-blue-500 mx-auto"></div>
            </div>
            
            <div class="grid md:grid-cols-3 gap-8">
                <div class="bg-gray-50 p-6 rounded-xl card-hover">
                    <div class="text-blue-500 text-4xl mb-4">
                        <i class="fas fa-file-pdf"></i>
                    </div>
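                    <h3 class="text-xl font-semibold mb-3 text-gray-800">Complex Document Parsing</h3>
                    <p class="text-gray-600">Structured information is hard to extract from complex documents such as PDFs and tables; OCR accuracy is often low or the output formatting comes out garbled.</p>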
                </div>
                
                <div class="bg-gray-50 p-6 rounded-xl card-hover">
                    <div class="text-blue-500 text-4xl mb-4">
                        <i class="fas fa-cogs"></i>
                    </div>
                    <h3 class="text-xl font-semibold mb-3 text-gray-800">Lack of Flexibility</h3>
                    <p class="text-gray-600">Existing tools often cannot meet scenario-specific needs such as high-accuracy table parsing.</p>
                </div>
                
                <div class="bg-gray-50 p-6 rounded-xl card-hover">
                    <div class="text-blue-500 text-4xl mb-4">
                        <i class="fas fa-project-diagram"></i>
                    </div>
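                    <h3 class="text-xl font-semibold mb-3 text-gray-800">High Engineering Complexity</h3>
                    <p class="text-gray-600">Building a document processing pipeline in-house is time-consuming and involves hard engineering problems such as layout analysis and semantic chunking.</p>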
                </div>
            </div>
            
            <div class="mt-12 bg-blue-50 p-6 rounded-xl border-l-4 border-blue-500">
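                <p class="drop-cap text-lg text-gray-700">Chunkr is built to solve these problems. It provides high-accuracy OCR, semantic chunking, and multi-format output, simplifying the document processing workflow so developers can focus on building AI applications rather than the underlying pipeline.</p>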
            </div>
        </div>
    </section>

    <!-- Core Features -->
    <section class="py-16 px-4 bg-gray-50">
        <div class="container mx-auto max-w-5xl">
            <div class="text-center mb-16">
                <h2 class="text-3xl md:text-4xl font-bold mb-4 text-gray-800">Core Features</h2>
                <p class="text-xl text-gray-600 max-w-3xl mx-auto">Chunkr provides the following core features to streamline document processing</p>
                <div class="w-20 h-1 bg-blue-500 mx-auto mt-4"></div>
            </div>
            
            <div class="grid md:grid-cols-2 gap-8">
                <div class="bg-white p-6 rounded-xl shadow-sm card-hover">
                    <div class="flex items-start">
                        <div class="feature-icon text-2xl mr-4 mt-1">
                            <i class="fas fa-search-plus"></i>
                        </div>
                        <div>
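                            <h3 class="text-xl font-semibold mb-2 text-gray-800">High-Accuracy OCR and Layout Analysis</h3>
                            <p class="text-gray-600">Recognizes text, tables, formulas, and other elements in a document and produces structured data with bounding boxes.</p>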
                        </div>
                    </div>
                </div>
                
                <div class="bg-white p-6 rounded-xl shadow-sm card-hover">
                    <div class="flex items-start">
                        <div class="feature-icon text-2xl mr-4 mt-1">
                            <i class="fas fa-puzzle-piece"></i>
                        </div>
                        <div>
                            <h3 class="text-xl font-semibold mb-2 text-gray-800">Semantic Chunking</h3>
                            <p class="text-gray-600">Splits documents into semantically complete segments, optimized as input for RAG pipelines and LLMs.</p>
                        </div>
                    </div>
                </div>
                
                <div class="bg-white p-6 rounded-xl shadow-sm card-hover">
                    <div class="flex items-start">
                        <div class="feature-icon text-2xl mr-4 mt-1">
                            <i class="fas fa-file-export"></i>
                        </div>
                        <div>
                            <h3 class="text-xl font-semibold mb-2 text-gray-800">Multi-Format Output</h3>
                            <p class="text-gray-600">Supports HTML, Markdown, JSON, and plain-text output to fit a range of downstream tasks.</p>
                        </div>
                    </div>
                </div>
                
                <div class="bg-white p-6 rounded-xl shadow-sm card-hover">
                    <div class="flex items-start">
                        <div class="feature-icon text-2xl mr-4 mt-1">
                            <i class="fas fa-eye"></i>
                        </div>
                        <div>
                            <h3 class="text-xl font-semibold mb-2 text-gray-800">Vision Model Support</h3>
                            <p class="text-gray-600">Integrates VLMs (vision-language models) to handle complex elements such as charts and handwritten content.</p>
                        </div>
                    </div>
                </div>
                
                <div class="bg-white p-6 rounded-xl shadow-sm card-hover md:col-span-2">
                    <div class="flex items-start">
                        <div class="feature-icon text-2xl mr-4 mt-1">
                            <i class="fas fa-server"></i>
                        </div>
                        <div>
                            <h3 class="text-xl font-semibold mb-2 text-gray-800">Self-Hosting and API</h3>
                            <p class="text-gray-600">Offers Docker and Kubernetes deployment options plus a Python SDK for easy integration.</p>
                        </div>
                    </div>
                </div>
            </div>
            
            <div class="mt-12 bg-white p-6 rounded-xl border-l-4 border-blue-500">
                <h3 class="text-xl font-semibold mb-4 text-gray-800 flex items-center">
                    <i class="fas fa-lightbulb text-yellow-500 mr-2"></i>
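                    <span>Food for Thought</span>
                </h3>
                <p class="text-gray-700">Does your project need to process large volumes of PDFs or tabular data? Could Chunkr's semantic chunking improve the performance of your RAG system?</p>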
            </div>
        </div>
    </section>

    <!-- Use Cases -->
    <section class="py-16 px-4 bg-white">
        <div class="container mx-auto max-w-5xl">
            <div class="text-center mb-16">
                <h2 class="text-3xl md:text-4xl font-bold mb-4 text-gray-800">Use Cases</h2>
                <p class="text-xl text-gray-600">Typical scenarios where Chunkr applies</p>
                <div class="w-20 h-1 bg-blue-500 mx-auto mt-4"></div>
            </div>
            
            <div class="grid md:grid-cols-3 gap-8">
                <div class="bg-gray-50 p-6 rounded-xl card-hover">
                    <div class="text-blue-500 text-2xl mb-3">
                        <i class="fas fa-graduation-cap"></i>
                    </div>
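                    <h3 class="text-xl font-semibold mb-3 text-gray-800">Academic Research Data Extraction</h3>
                    <p class="text-gray-600">Researchers work through large numbers of academic PDFs. Chunkr can extract headings, tables, and citations and produce structured Markdown, speeding up literature analysis.</p>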
                </div>
                
                <div class="bg-gray-50 p-6 rounded-xl card-hover">
                    <div class="text-blue-500 text-2xl mb-3">
                        <i class="fas fa-building"></i>
                    </div>
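                    <h3 class="text-xl font-semibold mb-3 text-gray-800">Enterprise Knowledge Base Construction</h3>
                    <p class="text-gray-600">Enterprises convert Word documents such as contracts and reports to JSON, and Chunkr's semantic chunking helps build an efficient retrieval system.</p>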
                </div>
                
                <div class="bg-gray-50 p-6 rounded-xl card-hover">
                    <div class="text-blue-500 text-2xl mb-3">
                        <i class="fas fa-chart-line"></i>
                    </div>
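                    <h3 class="text-xl font-semibold mb-3 text-gray-800">Financial Data Processing</h3>
                    <p class="text-gray-600">Finance teams parse complex tables in Excel or PDF files. Chunkr provides high-accuracy OCR and table extraction, simplifying data analysis.</p>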
                </div>
            </div>
        </div>
    </section>

    <!-- Advantages -->
    <section class="py-16 px-4 bg-gray-50">
        <div class="container mx-auto max-w-5xl">
            <div class="text-center mb-16">
                <h2 class="text-3xl md:text-4xl font-bold mb-4 text-gray-800">Advantages and Highlights</h2>
                <p class="text-xl text-gray-600 max-w-3xl mx-auto">What sets Chunkr apart from other document processing tools such as Tesseract or commercial APIs</p>
                <div class="w-20 h-1 bg-blue-500 mx-auto mt-4"></div>
            </div>
            
            <div class="grid md:grid-cols-2 gap-8">
                <div class="bg-white p-6 rounded-xl border border-gray-200 card-hover">
                    <div class="flex items-start">
                        <div class="text-blue-500 text-2xl mr-4 mt-1">
                            <i class="fab fa-osi"></i>
                        </div>
                        <div>
                            <h3 class="text-xl font-semibold mb-2 text-gray-800">Open-Source Flexibility</h3>
                            <p class="text-gray-600">Licensed under AGPL-3.0, it can be self-hosted, which reduces vendor lock-in and suits custom requirements.</p>
                        </div>
                    </div>
                </div>
                
                <div class="bg-white p-6 rounded-xl border border-gray-200 card-hover">
                    <div class="flex items-start">
                        <div class="text-blue-500 text-2xl mr-4 mt-1">
                            <i class="fas fa-tachometer-alt"></i>
                        </div>
                        <div>
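                            <h3 class="text-xl font-semibold mb-2 text-gray-800">High Performance</h3>
                            <p class="text-gray-600">Written in Rust, it can process 4 pages per second on a single RTX 4090, making it suitable for large-scale document processing.</p>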
                        </div>
                    </div>
                </div>
                
                <div class="bg-white p-6 rounded-xl border border-gray-200 card-hover">
                    <div class="flex items-start">
                        <div class="text-blue-500 text-2xl mr-4 mt-1">
                            <i class="fas fa-cubes"></i>
                        </div>
                        <div>
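                            <h3 class="text-xl font-semibold mb-2 text-gray-800">Modular Configuration</h3>
                            <p class="text-gray-600">Supports custom VLM, OCR, and chunking strategies to meet diverse needs, such as high-resolution cropping of specific pages.</p>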
                        </div>
                    </div>
                </div>
                
                <div class="bg-white p-6 rounded-xl border border-gray-200 card-hover">
                    <div class="flex items-start">
                        <div class="text-blue-500 text-2xl mr-4 mt-1">
                            <i class="fas fa-rocket"></i>
                        </div>
                        <div>
                            <h3 class="text-xl font-semibold mb-2 text-gray-800">Production Ready</h3>
                            <p class="text-gray-600">Provides Docker, Helm charts, and an API to simplify production deployment.</p>
                        </div>
                    </div>
                </div>
            </div>
            
            <div class="mt-8 bg-blue-50 p-6 rounded-xl">
                <h3 class="text-lg font-semibold mb-3 text-gray-800 flex items-center">
                    <i class="fas fa-exclamation-triangle text-yellow-500 mr-2"></i>
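                    <span>Limitations</span>
                </h3>
                <p class="text-gray-700">Self-hosting requires some technical background, and configuring an LLM or GPU environment can be challenging for newcomers. The free API trial is limited; higher quotas require a paid plan.</p>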
            </div>
        </div>
    </section>

    <!-- Quick Start -->
    <section id="quick-start" class="py-16 px-4 bg-white">
        <div class="container mx-auto max-w-5xl">
            <div class="text-center mb-16">
                <h2 class="text-3xl md:text-4xl font-bold mb-4 text-gray-800">Getting Started</h2>
                <p class="text-xl text-gray-600 max-w-3xl mx-auto">Steps to get up and running with Chunkr quickly</p>
                <div class="w-20 h-1 bg-blue-500 mx-auto mt-4"></div>
            </div>
            
            <div class="mb-12">
                <h3 class="text-2xl font-semibold mb-6 text-gray-800 flex items-center">
                    <span class="bg-blue-500 text-white rounded-full w-8 h-8 flex items-center justify-center mr-3">1</span>
                    <span>Using the Cloud API</span>
                </h3>
                
                <div class="grid md:grid-cols-2 gap-6">
                    <div class="bg-gray-50 p-6 rounded-lg">
                        <h4 class="text-lg font-semibold mb-3 text-gray-800 flex items-center">
                            <i class="fas fa-cloud text-blue-500 mr-2"></i>
                            <span>Get an API Key</span>
                        </h4>
                        <p class="text-gray-600 mb-4">Visit <a href="https://chunkr.ai" class="text-blue-500 hover:underline" target="_blank" rel="noopener">chunkr.ai</a>, sign up for an account, and get an API key.</p>
                        
                        <h4 class="text-lg font-semibold mb-3 text-gray-800 flex items-center mt-4">
                            <i class="fas fa-terminal text-blue-500 mr-2"></i>
                            <span>Install the SDK</span>
                        </h4>
                        <div class="code-block p-4 mb-4">
                            <code class="text-gray-200 font-mono">pip install chunkr-ai</code>
                        </div>
                    </div>
                    
                    <div class="bg-gray-900 rounded-lg overflow-hidden">
                        <div class="p-4 bg-gray-800 flex items-center">
                            <div class="flex space-x-2 mr-4">
                                <span class="w-3 h-3 rounded-full bg-red-500"></span>
                                <span class="w-3 h-3 rounded-full bg-yellow-500"></span>
                                <span class="w-3 h-3 rounded-full bg-green-500"></span>
                            </div>
                            <div class="text-sm text-gray-400">Example code</div>
                        </div>
                        <div class="p-4 overflow-x-auto">
                            <pre class="text-gray-300 font-mono text-sm">
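<span class="text-blue-400">from</span> chunkr_ai <span class="text-blue-400">import</span> Chunkr
<span class="text-gray-500"># Create a client (replace the placeholder with your own API key)</span>
chunkr = Chunkr(api_key=<span class="text-green-400">"your_api_key"</span>)
<span class="text-gray-500"># Upload a document by URL for processing</span>
url = <span class="text-green-400">"https://chunkr-web.s3.us-east-1.amazonaws.com/landing_page/input/science.pdf"</span>
task = chunkr.upload(url)
<span class="text-gray-500"># Save the result as Markdown, then release the client</span>
markdown = task.markdown(output_file=<span class="text-green-400">"output.md"</span>)
chunkr.close()</pre>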
                        </div>
                    </div>
                </div>
            </div>
            
            <div class="mb-12">
                <h3 class="text-2xl font-semibold mb-6 text-gray-800 flex items-center">
                    <span class="bg-blue-500 text-white rounded-full w-8 h-8 flex items-center justify-center mr-3">2</span>
                    <span>Self-Hosted Deployment</span>
                </h3>
                
                <div class="grid md:grid-cols-2 gap-6">
                    <div class="bg-gray-50 p-6 rounded-lg">
                        <h4 class="text-lg font-semibold mb-3 text-gray-800 flex items-center">
                            <i class="fab fa-docker text-blue-500 mr-2"></i>
                            <span>Install Dependencies</span>
                        </h4>
                        <p class="text-gray-600 mb-4">Install Docker and Docker Compose.</p>
                        
                        <h4 class="text-lg font-semibold mb-3 text-gray-800 flex items-center mt-4">
                            <i class="fas fa-code-branch text-blue-500 mr-2"></i>
                            <span>Clone the Repository</span>
                        </h4>
                        <div class="code-block p-4 mb-4">
                            <code class="text-gray-200 font-mono">git clone https://github.com/lumina-ai-inc/chunkr &amp;&amp; cd chunkr</code>
                        </div>
                    </div>
                    
                    <div class="bg-gray-50 p-6 rounded-lg">
                        <h4 class="text-lg font-semibold mb-3 text-gray-800 flex items-center">
                            <i class="fas fa-cog text-blue-500 mr-2"></i>
                            <span>Configure the Environment</span>
                        </h4>
                        <div class="code-block p-4 mb-4">
                            <code class="text-gray-200 font-mono">cp .env.example .env</code>
                        </div>
                        <p class="text-gray-600 mb-4">Then edit the LLM settings.</p>
                        
                        <h4 class="text-lg font-semibold mb-3 text-gray-800 flex items-center mt-4">
                            <i class="fas fa-play text-blue-500 mr-2"></i>
                            <span>Start the Services</span>
                        </h4>
                        <div class="code-block p-4">
                            <code class="text-gray-200 font-mono">docker compose up -d</code>
                        </div>
                    </div>
                </div>
                
                <div class="mt-6 bg-blue-50 p-6 rounded-xl">
                    <h4 class="text-lg font-semibold mb-3 text-gray-800 flex items-center">
                        <i class="fas fa-globe text-blue-500 mr-2"></i>
                        <span>Access the Services</span>
                    </h4>
                    <p class="text-gray-700">Web UI: <a href="http://localhost:5173" class="text-blue-500 hover:underline" target="_blank">http://localhost:5173</a></p>
                    <p class="text-gray-700">API: <a href="http://localhost:8000" class="text-blue-500 hover:underline" target="_blank">http://localhost:8000</a></p>
                </div>
            </div>
            
            <div class="bg-yellow-50 p-6 rounded-xl border-l-4 border-yellow-400">
                <h3 class="text-lg font-semibold mb-3 text-gray-800 flex items-center">
                    <i class="fas fa-lightbulb text-yellow-500 mr-2"></i>
                    <span>Tip</span>
                </h3>
                <p class="text-gray-700">If you are new to Chunkr, start with the cloud API and test it on a small set of documents before considering self-hosting.</p>
            </div>
        </div>
    </section>

    <script src="https://cdn.jsdelivr.net/npm/mermaid@latest/dist/mermaid.min.js"></script>
    <script>
        mermaid.initialize({
            startOnLoad: true,
            theme: 'default',
        });
    </script>
</body>
</html>
```
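
The page describes semantic chunking only at a high level. As a rough illustration of the general idea, and not Chunkr's actual algorithm, the sketch below splits Markdown text (such as the `output.md` produced by the SDK example) into chunks that begin at headings and stay under a word budget. The names `chunk_markdown` and `max_words` are hypothetical and are not part of the `chunkr-ai` SDK.

```python
import re
from typing import List

# Matches ATX-style Markdown headings such as "# Title" or "### Section".
HEADING = re.compile(r"^#{1,6}\s+\S")


def chunk_markdown(text: str, max_words: int = 200) -> List[str]:
    """Split Markdown into chunks that start at headings and respect a word budget.

    Illustrative heuristic only; a single line longer than max_words is kept whole.
    """
    chunks: List[str] = []
    current: List[str] = []  # lines of the chunk being built
    words = 0                # word count of the chunk being built

    def flush() -> None:
        nonlocal current, words
        body = "\n".join(current).strip()
        if body:
            chunks.append(body)
        current, words = [], 0

    for line in text.splitlines():
        n = len(line.split())
        # Start a new chunk at each heading, or when the budget would be exceeded.
        if HEADING.match(line) or (current and words + n > max_words):
            flush()
        current.append(line)
        words += n
    flush()
    return chunks


sample = "# Intro\nChunkr parses documents.\n\n## Tables\nTables are extracted too.\n"
for chunk in chunk_markdown(sample):
    print(repr(chunk))
```

Splitting on headings keeps each chunk aligned with a section of the document, while the word budget caps chunk size for retrieval; a production system would more likely count tokens and use layout information than rely on this heuristic.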